Predicting survival and neurological outcome in out-of-hospital cardiac arrest using machine learning: the SCARS model

Summary
Background A prediction model that estimates survival and neurological outcome in out-of-hospital cardiac arrest patients has the potential to improve clinical management in emergency rooms.
Methods We used the Swedish Registry for Cardiopulmonary Resuscitation to study all out-of-hospital cardiac arrest (OHCA) cases in Sweden from 2010 to 2020. We had 393 candidate predictors describing the circumstances at cardiac arrest, critical time intervals, patient demographics, initial presentation, spatiotemporal data, socioeconomic status, medications, and comorbidities before arrest. To develop, evaluate and test an array of prediction models, we created stratified (on the outcome measure) random samples of our study population. We created a training set (60% of data), evaluation set (20% of data), and test set (20% of data). We assessed the 30-day survival and cerebral performance category (CPC) score at discharge using several machine learning frameworks with hyperparameter tuning. Parsimonious models with the top 1 to 20 strongest predictors were tested. We calibrated the decision threshold to assess the cut-off yielding 95% sensitivity for survival. The final model was deployed as a web application.
Findings We included 55,615 cases of OHCA. Initial presentation, prehospital interventions, and critical time interval variables were the most important. At a sensitivity of 95%, specificity was 89%, positive predictive value 52%, and negative predictive value 99% in test data to predict 30-day survival. The area under the receiver operating characteristic curve was 0.97 in test data using all 393 predictors or only the ten most important predictors. The final model showed excellent calibration. The web application allowed for near-instantaneous survival calculations.
Interpretation Thirty-day survival and neurological outcome in OHCA can rapidly and reliably be estimated during ongoing cardiopulmonary resuscitation in the emergency room using a machine learning model incorporating widely available variables.
Funding Swedish Research Council (2019-02019); Swedish state under the agreement between the Swedish government and the county councils (ALFGBG-971482); the Wallenberg Centre for Molecular and Translational Medicine.

Introduction
Out-of-hospital cardiac arrest (OHCA) is common and associated with poor survival rates of 5-10%. According to the EuReCa TWO study, 1 cardiopulmonary resuscitation (CPR) is started in approximately 68% of OHCA cases confirmed by the emergency medical service (EMS). Among those who received CPR, resuscitation efforts were terminated prehospitally in 64% of the cases, while the remaining 36% were transported to a hospital, and among those admitted, roughly one in four survived. Overall survival in OHCA was 8%. In other words, most OHCA cases transported to hospital die despite prehospital and in-hospital resuscitation efforts.
While chest compressions are effective even in the hands of laymen, 2 the decision to terminate CPR should lie in the hands of experienced physicians and EMS personnel, after careful consideration of the likelihood of survival and neurological prognosis. Regrettably, humans are at best poor at probabilistic calculations. 3,4 Given the discrepancy between rates of return of spontaneous circulation (ROSC) and 30-day survival, which may differ up to 10-fold, it is desirable to develop a clinical prediction model for use in the emergency department (ED). 5 We used the Swedish Registry for Cardiopulmonary Resuscitation (SRCR) to develop a machine learning-derived clinical prediction model using 393 candidate predictors in 55,615 cases of OHCA. Feature selection and model specification were entirely data-driven. The final model was deployed as a web application and required less than 30 s to obtain real-time predictions using the machine learning model. The model was developed using variables available no later than patient arrival in the ED, which marks the potential point of use.

Study population
We used the SRCR to include all cases of OHCA from 2010 to 2020. The registry has been described previously. 2,6 For a description of the Swedish EMS system as well as the in-hospital organization, refer to supplementary discussion 1. The SRCR is a nationwide quality registry and was launched in 1990. Since 2008, all ambulance organizations in Sweden reported OHCA cases to the registry. Data reporting to the SRCR follows the Utstein style. We included all OHCA patients aged 18 years and older, where resuscitation was attempted from Jan 1st, 2010 to Dec 31st, 2020.
An OHCA is defined as a cardiac arrest occurring in non-hospitalized patients. The first recorded rhythm is defined as either shockable (ventricular fibrillation or pulseless ventricular tachycardia, VF/pVT) or nonshockable (pulseless electrical activity (PEA) or asystole). Time delays from collapse to the emergency call, start of CPR, delivery of defibrillations, dispatch of EMS, and the arrival of EMS at the scene are reported. Additional variable details are provided below. Information regarding do-not-attempt-resuscitation (DNAR) orders is not recorded in the SRCR and is therefore not included in the study.

Research in context
Evidence before this study While up to 30% of all cases of out-of-hospital cardiac arrest (OHCA) achieve return of spontaneous circulation (ROSC), only 10% survive, such that a substantial proportion undergo difficult and potentially long intensive care with death as the outcome. OHCA is a leading cause of death and the cost of OHCA is staggering at all levels. Predicting survival and neurological function in OHCA is very difficult for physicians, and there are few tools to aid the decision process.
Added value of this study Using over 55,615 cases of OHCA, for whom we had almost 400 predictors of survival, we have developed SCARS-1, an operational machine learning model that enables clinicians to calculate survival and neurological function in <15 s with an area under the receiver operating curve (AUROC) of 0.95, with excellent calibration. Additionally, we developed the SCARS-1 web application (freely available for download) that enables intuitive and easy access to the machine learning model.
Implications of all the available evidence OHCA carries a poor prognosis and a majority of OHCA patients transported to the emergency department without ROSC do not survive. The SCARS-1 prediction model and web application can assist the clinician in the ED by offering a robust prediction of survival and neurological function within <15 s.

Data merger
We merged the SRCR with the Swedish Inpatient and Outpatient Registry. The Inpatient Registry contains all inpatient records since 1987 and has been validated. 7 The Outpatient Registry contains all outpatient clinic visits since 2002. The primary and up to 20 secondary diagnoses are available in each registry. Diagnoses are classified using the International Classification of Disease (ICD) versions 9 and 10, with data retrieved from 1987 for the Inpatient Registry and 2002 in the Outpatient Registry. The categorization in the SRCR of causes of OHCA has changed over time; both categorizations are reported.
Medications were retrieved from the Swedish Prescribed Drug Registry, including all prescriptions registered and expedited since 2005. In addition, we retrieved prescriptions registered from Jan 1st, 2008, according to the Anatomical Therapeutic Chemical (ATC) classes.
The LISA (integrated longitudinal database for health insurance and labor market studies) database was used to obtain socioeconomic data, e.g., income, education, country of birth, housing conditions, etc. We only used data recorded during the year before cardiac arrest to avoid reverse causation between socioeconomic status and cardiac arrest (e.g., income may be dramatically reduced after cardiac arrest).

Candidate predictors
After merging the above-mentioned data sources, we had 393 predictors. These included seven predictors describing the circumstances of the cardiac arrest, six critical time intervals during the cardiac arrest, four predictors describing patient demographics, six predictors describing initial presentation at EMS arrival, two geographical predictors, nine predictors on spatiotemporal data during arrest, three predictors on socioeconomic status, 18 medication classes, and 328 comorbidities from the Swedish in- and outpatient registries. For a description of included variables relating to comorbidities and medications, see Table S1. For a flow chart of included variables, refer to Fig. S1.

Outcome measures
The primary outcome measure was survival at 30 days. The secondary outcome measure was neurological function measured using cerebral performance category (CPC) score at discharge (1, no or mild sequelae; 2, moderate sequelae; 3, severe sequelae; 4, vegetative state; 5, brain dead).

Role of the funding source
The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.

Ethics
The study complies with the Declaration of Helsinki and was approved by the Swedish Ethical Review Authority (No 2020-02017). The need for informed consent was waived by the Ethical Review Authority due to the retrospective nature of the study.

Descriptive statistics
Patient characteristics are described using frequencies and relative frequencies. No inferences are made from baseline data, representing the entire population with OHCA in Sweden during the period. We only present the 15 most common coexisting conditions and medications.

Machine learning

Model training and testing
To develop, evaluate and test an array of prediction models, we created stratified (on the outcome measure) random samples of our study population. We created a training set (60% of data), evaluation set (20% of data), and test set (20% of data). We used the whole study population to split into train, evaluation and test sets. For a description of software refer to supplemental discussion 2.
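The stratified 60/20/20 split described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code (their software is described in supplemental discussion 2); the function name `stratified_split`, the seed, and the toy outcome vector are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, evaluation, test) index lists, stratified on `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, evaluation, test = [], [], []
    for indices in by_class.values():
        # Shuffle within each outcome class, then cut 60/20/20 so every
        # subset keeps the same survivor proportion as the full population.
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_eval = round(fractions[1] * len(indices))
        train += indices[:n_train]
        evaluation += indices[n_train:n_train + n_eval]
        test += indices[n_train + n_eval:]
    return train, evaluation, test

# Toy outcome vector (hypothetical): 1 = alive at 30 days, 0 = deceased.
labels = [1] * 100 + [0] * 900
train, evaluation, test = stratified_split(labels)
```

Splitting within each outcome class, rather than over the pooled population, is what keeps the survivor proportion identical across the three subsets.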

Model evaluation and comparison
We used 5-fold cross-validation, repeated five times to compare the competing models. Comparisons across models were made using the area under the receiver operating characteristic (ROC) curve.
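As a rough sketch of this comparison procedure, the code below generates the 25 train/validation partitions of a 5-fold cross-validation repeated five times and computes the AUC as the probability that a randomly chosen survivor receives a higher score than a randomly chosen non-survivor. The helper names are hypothetical, the folds here are not stratified (the text does not say whether the authors' folds were), and the quadratic AUC loop is only suitable for small examples.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=5, seed=0):
    """Yield (train_indices, validation_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        folds = [order[i::k] for i in range(k)]
        for held_out in range(k):
            validation = folds[held_out]
            train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
            yield train, validation

def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

n_partitions = len(list(repeated_kfold(20)))
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In practice each candidate model would be fitted on every `train` partition and scored with `roc_auc` on the matching validation fold, and the 25 AUC values would then be averaged per model.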

Addressing imbalance
For binary classification problems, the relative frequencies of the classes can have a significant impact on model performance. 8 An imbalance is evident in the SRCR since approximately 10% of all patients survive. We addressed imbalance by down-sampling the number of deceased individuals, such that for one survivor, we included four deceased individuals, which reduced class imbalance. Only the training data set was artificially balanced. The evaluation and test sets were unaltered to assess model performance on representative data.
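The down-sampling step can be illustrated with a short sketch. It keeps every survivor and draws a random sample of deceased individuals so that the ratio is one survivor to four deceased; the function name, seed, and toy labels are assumptions made for the example, and only the training subset would be passed through it.

```python
import random

def downsample_majority(labels, ratio=4, seed=0):
    """Keep all survivors (1) and at most `ratio` deceased (0) per survivor."""
    rng = random.Random(seed)
    survivors = [i for i, y in enumerate(labels) if y == 1]
    deceased = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(deceased), ratio * len(survivors))
    kept_deceased = rng.sample(deceased, n_keep)  # sampling without replacement
    return sorted(survivors + kept_deceased)

# Toy training labels (hypothetical): 10% survivors before balancing.
train_labels = [1] * 60 + [0] * 540
balanced = downsample_majority(train_labels)
```

After balancing, survivors make up 20% of the toy training set instead of 10%.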

Candidate model frameworks
We initially considered logistic regression, support vector machines (SVM), neural networks, gradient boosting, extreme gradient boosting, and random forest as candidate prediction models. Frameworks requiring hyperparameter tuning (gradient boosting, extreme gradient boosting, random forest, SVM, neural networks) were tuned using a manual grid search. Comparisons were made using the area under the ROC curve (AUC).
Extreme gradient boosting (XGB) outperformed all other models. We tuned the XGB model using a grid of 144 different hyperparameter combinations (Fig. S2). Still, we noted that additional improvement in AUC was possible and computed 14 additional models, which showed no further improvement. During hyperparameter tuning, we tuned eta (step size shrinkage), gamma (minimum loss reduction required to further partition a leaf node), rounds (number of trees grown), max depth (tree depth), column sample by tree (proportion of columns sampled for each tree), min child weight (minimum sum of instance weight needed in a child) and subsample (proportion of patients selected for each tree).
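A manual grid of this kind is simply the Cartesian product of candidate values for each hyperparameter. The sketch below shows how such a grid could be expanded. The candidate values are placeholders chosen only so that the product has 144 combinations and contains the final configuration reported under Model selection; they are not the authors' grid, which is shown in Fig. S2. The parameter names follow XGBoost's conventions (`nrounds` is the name used by its R interface).

```python
from itertools import product

# Hypothetical search space: placeholder values, NOT the grid from Fig. S2.
grid = {
    "eta": [0.01, 0.05, 0.1],
    "max_depth": [4, 6, 8, 10],
    "nrounds": [600, 1000, 1400],
    "colsample_bytree": [0.8, 1.0],
    "subsample": [0.7, 1.0],
    "gamma": [0],
    "min_child_weight": [1],
}

def expand_grid(grid):
    """Return one dict per combination of the candidate values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]

combinations = expand_grid(grid)  # 3 * 4 * 3 * 2 * 2 * 1 * 1 = 144 entries
```

Each entry of `combinations` would be used to fit one model, with the cross-validated AUC deciding which configuration is kept.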

Tuning of decision threshold
We derived new cut-offs for the classification (decision) threshold using the evaluation set. The conventional 50% probability threshold may be suboptimal for the current prediction task. An optimal prediction model would have 100% sensitivity and 100% specificity for predicting survival. Low sensitivity could theoretically result in patient harm since the model may predict death in a patient who could survive. We therefore tuned our threshold to maximize sensitivity for survival. The threshold was calibrated using the ROC curve. This was done by maximizing the Youden index (sensitivity + specificity - 1), which summarizes the proportion of correctly predicted samples in both the survival and deceased groups. The Youden index can be computed for each cut-off on the ROC curve; the cut-off that maximizes it represents the optimal threshold. In addition, we identified the cut-off that yielded a sensitivity of 95%.
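Both threshold rules can be sketched in a few lines. The code computes sensitivity and specificity at every distinct predicted probability, then picks (a) the cut-off with the largest Youden index and (b) the highest cut-off that still reaches a target sensitivity. The function names and the toy evaluation data are hypothetical, and a patient is classified as a predicted survivor when the predicted probability is at or above the cut-off.

```python
def roc_points(y_true, probs):
    """Return (threshold, sensitivity, specificity) for each distinct probability."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for t in sorted(set(probs)):
        tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points

def youden_threshold(points):
    """Cut-off maximizing J = sensitivity + specificity - 1."""
    return max(points, key=lambda pt: pt[1] + pt[2] - 1)[0]

def threshold_for_sensitivity(points, target=0.95):
    """Highest cut-off whose sensitivity is still at least `target`."""
    eligible = [pt for pt in points if pt[1] >= target]
    return max(eligible, key=lambda pt: pt[0])[0]

# Toy evaluation data (hypothetical): 1 = survived, with predicted probabilities.
y_eval = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p_eval = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.12, 0.30, 0.60, 0.90]

points = roc_points(y_eval, p_eval)
t_youden = youden_threshold(points)
t_sens95 = threshold_for_sensitivity(points, target=0.95)
```

Choosing the highest eligible cut-off in the second rule preserves as much specificity as possible once the sensitivity requirement is met.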

Relative predictor importance
Because the model included 393 predictors, ranging from coexisting conditions to prehospital interventions performed by the EMS, we calculated the relative importance of all predictors. This clarifies which predictors are the most important for determining survival in OHCA. The top 20 predictors were then used to create and evaluate parsimonious XGB models containing the top 1 to 20 predictors, added in order of importance. The purpose of this was to determine how many predictors were required to achieve acceptable precision for deploying a clinically feasible web application.

Calibration
Model calibration was evaluated by comparing observed probabilities versus predicted probabilities.
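One common way to make this comparison is to group patients into bins of predicted probability and compare the mean prediction with the observed survival proportion in each bin. The sketch below illustrates that approach; the binning scheme and the function name are assumptions, since the text does not specify how the calibration plot in Fig. 4 was constructed.

```python
def calibration_table(y_true, probs, n_bins=10):
    """Return (mean predicted, observed proportion, n) per non-empty probability bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            rows.append((mean_pred, observed, len(members)))
    return rows

# Toy data (hypothetical): perfectly calibrated predictions in two bins.
y_toy = [0, 0, 0, 1, 1, 1, 1, 0]
p_toy = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
rows = calibration_table(y_toy, p_toy, n_bins=2)
```

A well-calibrated model yields rows in which the first two values are close to each other across the whole range of probabilities.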

Missingness
We imputed missing values using a non-parametric method combining random forest with predictive mean matching. 9,10 For a description of the degree of missingness for the top 20 predictors, see Table S3.

Web application
The final model was deployed as a web application (Fig. S5).

Role of the funding source
The funder of the study had no role in study design, data collection, analysis, interpretation, writing of the paper or decision to submit for publication.

Results
A total of 55,615 cases of OHCA were recorded in the SRCR from 2010 to 2020. Table 1 shows the baseline characteristics of the study population (Table S4).

Data sets, features, and hyperparameter tuning
A total of 393 predictors were included in the models (using the same set of features for all models). All features not included in the baseline tables are described in Table S1.    Table S2.

Model selection
The best model comprised 1400 trees, with a maximum tree depth of 10, shrinkage (eta) of 0.01, gamma (minimum loss reduction) of 0, a column sample proportion per tree of 0.8, a minimum sum of instance weight needed in a child of 1, and a subsample ratio of 0.7. The difference between various models was negligible. The performance of the final model in the evaluation set was AUC 0.97. With the standard decision threshold of 50% (for classification of dead vs. alive), the sensitivity was 0.82, specificity 0.97, PPV 0.80, and NPV 0.98. Fig. 1a shows the relative importance of the top 50 predictors (among all 393). Variables that were available in the SRCR dominated in terms of importance. Among variables not specific to the SRCR, only calendar year and age were among the top predictors. Variables concerning the initial presentation, age, use of adrenaline, and critical time intervals were the most important predictors. Coexisting conditions were much less important, as were medications, socioeconomic status, and circumstances around the cardiac arrest. Fig. 1b shows all predictors collapsed into categories of predictors. This figure shows that the emergency department (ED) presentation, prehospital interventions, and critical time intervals dominated the importance.

Best predictors
Refitting the final model with the top 1 to 20 predictors, in sequential order, shows that the AUC values increased rapidly when adding the first few top predictors. Still, little AUC was gained after including additional predictors (Fig. 2). There was no difference between the top 10 and 393 predictors regarding AUC (p = 0.20). The top 20 predictors were related to the initial presentation, prehospital interventions, and critical time intervals (i.e., data available to the physicians in the ED). The model performance on training data and test data regarding CPC-score can be seen in Fig. S3.

Tuning the decision threshold: evaluation data set
Using the standard decision threshold (50% cut-off), sensitivity was 77%, specificity 98%, positive predictive value (PPV) 86%, and negative predictive value (NPV) 97%. The Youden index suggested a decision threshold at 7.9%, which yielded a sensitivity of 92%, specificity of 92%, PPV of 60%, and NPV of 99%. To achieve a sensitivity of 95%, the decision threshold was set at 4.6% probability of survival, resulting in sensitivity 95%, specificity 89%, PPV 52%, and NPV 99%. Refer to Fig. 3 for AUC-ROC on evaluation data and Table S5 for details.
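The modest PPV at the 95% sensitivity cut-off follows largely from the low survival rate rather than from poor discrimination. The sketch below applies Bayes' rule to the reported sensitivity and specificity; the 10% survival prevalence is an assumption taken from the approximate figure quoted under Addressing imbalance, not the exact proportion in the evaluation set, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Reported operating point; prevalence of 0.10 is an assumed round number.
ppv, npv = predictive_values(0.95, 0.89, prevalence=0.10)
print(f"PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under this assumption the implied PPV is about 49% and the NPV about 99%, close to the reported 52% and 99%; the small gap in PPV is consistent with the true survivor proportion differing slightly from 10%.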

Calibration
The calibration plot is presented in Fig. 4. The final model showed excellent calibration across the spectrum of survival probabilities.

Web application
The web application is available at https://gocares.se. Refer to Fig. S4 for a step-by-step tutorial on the use of the application with 3 example patients.

Discussion
Rapid and accurate probability calculations are critical in the setting of an out-of-hospital cardiac arrest. The probability of survival, selection of interventions, likely underlying cause, etc., are essential information needed in seconds. Good prediction models can save lives and reduce suffering and costs by avoiding futile resuscitation efforts. Death due to cardiovascular causes is a public health issue associated with more deaths than any cancer. 11 Out-of-hospital cardiac arrest is managed by ambulance and hospital staff with varying knowledge and experience. We have developed a clinical prediction model with good performance using a data-driven machine learning approach. It utilizes predictors that can be collected in most health care settings. Our prediction model was derived using 393 candidate predictors in 55,615 cases of OHCA to predict 30-day survival. The top 20 predictors were related to the initial presentation, prehospital interventions, and critical time intervals, i.e., readily available to the physicians in the ED. This is a fundamental aspect of the model; limiting the variable inclusion to the time of arrival at the ED results in a prediction model that can be used immediately upon ambulance arrival at the hospital. Moreover, the model is dominated by relatively few predictors, making it possible to predict with very high precision using only 5 to 10 predictors.
The developed web application uses the final prediction model to calculate survival, and entering all variables for one patient takes approximately 30 s, which highlights the clinical potential for such models.
A few similar studies merit further discussion. Seki et al. 12 developed a machine learning algorithm for one-year survival following OHCA using 53 predictors. The model showed good performance (AUC 0.94). Still, the usefulness of the prediction tool was limited since only OHCA of presumed cardiac etiology were included, and it is improbable that any clinician will be able to enter data on 53 variables during ongoing CPR. Kwon et al. 13 used multiple machine learning methods in addition to logistic regression models to predict survival following OHCA among patients that achieved ROSC. Although the model showed high AUC (0.95) by only including patients with ROSC, the usefulness of such a score in the ED is limited since most patients with sustained ROSC are transferred to the ICU for further treatment, and prognostication is often made at a later stage. The same limitation applies to several other prediction tools (Miracle2, 14 OHCA, 15 TTM, 16 CAHP 17 ): they can be used to predict neurological function among patients with OHCA and ROSC, but they do not help the clinician with a patient with OHCA and no ROSC in the ED. We present the only prediction model for OHCA with an AUC over 0.95 using only readily available variables.
The decision to continue or terminate resuscitation should be made based on all available information: assessing medical futility, defining an unfavorable outcome, and incorporating the patient's views on quality of life (when known) and what the patient considered a life worth living. Medical futility has been defined as having less than a one percent chance of survival. 18 This definition has received criticism for not considering functional outcomes and being biased towards some socioeconomic, demographic, and cultural factors and groups. Relying on one prediction tool comes with the risk of self-fulfillment, and existing ethical guidelines recommend taking multiple factors into account. 5 This model, hereafter referred to as SCARS (Swedish Cardiac Arrest Risk Score), estimates the chance of survival and neurological function based on an entire nation's collected data and can be used as an adjunct to other available information. The resuscitation team can incorporate the estimated survival probability with less quantifiable data, such as frailty, the patient's personal views, quality of life, and previously stated opinions and wills, to make an informed decision to continue or terminate resuscitation. The SCARS model does not suggest whether or not CPR should be discontinued; it merely calculates the probability of survival. We recommend that the decision to withhold or withdraw resuscitation be context-specific and note that all tools intervening in patient management must be evaluated in clinical trials before clinical implementation.
Most cardiac arrests occur in the out-of-hospital setting, and only 5-10% of the patients in whom resuscitation is initiated survive. 1 If the OHCAs where resuscitation was never initiated are included, the number of survivors is below 5%; i.e., in more than 95% of OHCA cases, the decision to withhold or withdraw resuscitation is made. There is a significant variation in prehospital Termination of Resuscitation (TOR) between EMS systems, 19,20 ranging from 0% to over 50% prehospital TOR. This variability is likely due to different policies and practices regarding the termination of resuscitation and not due to medical reasons. The use of a validated and reliable prehospital and emergency department prediction tool could result in patients being treated more uniformly regardless of location.
Conflicting evidence exists as to whether experienced clinicians are better at distinguishing survivors from nonsurvivors than existing scoring systems (APACHE, SAPS, MPM). 21 ICU physicians were better than existing scoring systems when assessing survival 24 h post ICU admission. Still, both the physicians and the scoring systems were only moderately accurate at outcome prediction. 22 Data suggest that estimates of prognosis are often based on experience and clinical reasoning skills rather than objective information, 23,24 which can sometimes be misleading. Data from physician estimates of prognosis in terminally ill patients showed that physician estimates of time to death were only correct 20% of the time (within 33% of actual survival), overoptimistic predictions (63%) were common, and overall physicians overestimated survival time by a factor of 5. 23,25 To get an approximation of how difficult estimation of survival probability can be, we asked 17 physicians working in the ICU and CCU (unpublished data) to estimate the chance of survival in four different OHCA scenarios. The clinicians' estimates deviated markedly (by up to a factor of 3) from crude and predicted survival probabilities, suggesting that prediction tools are needed.
A strength of the present study is the extensive, representative data set, the many included parameters, and the large number of models evaluated. The SCARS prediction tool reliably predicts survival following OHCA, and the benefits of a well-functioning and reliable tool for patients, health care workers, and society are potentially of great importance. In addition, future research should prospectively and externally validate the prediction tool using data from other nationwide out-of-hospital cardiac arrest registries.

Limitations
Our prediction model cannot be used in the prehospital setting since it includes two predictors recorded at hospital arrival (consciousness and circulation on hospital arrival). Nor should our model be used in patients with ROSC or consciousness, since they should always be admitted to the appropriate level of care. However, including these predictors is critical since they were the strongest survival predictors, and they are readily available to the clinician in the ER. Our prediction model has not been validated in other countries and should be used with caution elsewhere. However, the SRCR includes over 95% of all cases of OHCA in Sweden, allowing us to validate the model using data available to us. While we did try neural networks, random forest, and several other machine learning frameworks, we did not perform an exhaustive hyperparameter tuning. The selection of extreme gradient boosting was based on the fact that it outperformed all other models in the initial grid searches. A vast body of science has demonstrated this framework to outperform all others when modeling structured data generally. As can be seen in Table S3, there was a relatively high degree of missingness relating to some variables. By using multiple imputations and comparing imputed with original data, we conclude that the missing values were unlikely to impact the overall results.

Conclusion
The SCARS prediction model can reliably and rapidly estimate the likelihood of survival using five to ten readily available variables such as ROSC, consciousness in the ED, use of adrenaline, initial presentation, and age. The model can provide clinicians with critical information during ongoing CPR in the ED.